AFLEX

Section: User Commands (1)
Updated: 1 September 1990
 

NAME

aflex - fast lexical analyzer generator for Ada  

SYNOPSIS

aflex [ -bdfipstvILT -Sskeleton_file ] [ filename ]  

DESCRIPTION

aflex is a version of the Unix tool lex, but it is written in Ada and generates scanners in Ada. It is upwardly compatible with the UCI tool alex, but is much faster and generates smaller scanners.

OPTIONS

Command line options are given in a different format than in the old UCI alex. Aflex options are as follows:
-t
Write the scanner output to the standard output rather than to a file. The default name of the scanner file for base.l is base.a. Note that this option is not as useful with aflex because, in addition to the scanner file, there are files for the externally visible DFA functions (base_dfa.a) and the external IO functions (base_io.a).
-b
Generate backtracking information to aflex.backtrack. This is a list of scanner states which require backtracking and the input characters on which they do so. By adding rules one can remove backtracking states. If all backtracking states are eliminated and -f is used, the generated scanner will run faster (see the -p flag). Only users who wish to squeeze every last cycle out of their scanners need worry about this option.
-d
makes the generated scanner run in debug mode. Whenever a pattern is recognized the scanner will write to stderr a line of the form:
    --accepting rule #n

Rules are numbered sequentially with the first one being 1. Rule #0 is executed when the scanner backtracks; Rule #(n+1) (where n is the number of rules) indicates the default action; Rule #(n+2) indicates that the input buffer is empty and needs to be refilled and then the scan restarted. Rules beyond (n+2) are end-of-file actions.
-f
has the same effect as lex's -f flag (do not compress the scanner tables); the mnemonic changes from fast compilation to (take your pick) full table or fast scanner. The actual compilation takes longer, since aflex is I/O bound writing out the big table. The compilation of the Ada file containing the scanner is also likely to take a long time because of the large arrays generated.
-i
instructs aflex to generate a case-insensitive scanner. The case of letters given in the aflex input patterns will be ignored, and the rules will be matched regardless of case. The matched text given in yytext keeps its original case (i.e., it will not be folded).
-p
generates a performance report to stderr. The report consists of comments regarding features of the aflex input file which will cause a loss of performance in the resulting scanner. Note that the use of the ^ operator and the -I flag entail minor performance penalties.
-s
causes the default rule (that unmatched scanner input is echoed to stdout) to be suppressed. If the scanner encounters input that does not match any of its rules, it aborts with an error. This option is useful for finding holes in a scanner's rule set.
-v
has the same meaning as for lex (print to stderr a summary of statistics of the generated scanner). Many more statistics are printed, though, and the summary spans several lines. Most of the statistics are meaningless to the casual aflex user, but the first line identifies the version of aflex, which is useful for figuring out where you stand with respect to patches and new releases.
-I
instructs aflex to generate an interactive scanner. Normally, scanners generated by aflex always look ahead one character before deciding that a rule has been matched. At the cost of some scanning overhead, aflex will generate a scanner which only looks ahead when needed. Such scanners are called interactive because if you want to write a scanner for an interactive system such as a command shell, you will probably want the user's input to be terminated with a newline, and without -I the user will have to type a character in addition to the newline in order to have the newline recognized. This leads to dreadful interactive performance.
If all this seems too confusing, here's the general rule: if a human will be typing in input to your scanner, use -I, otherwise don't; if you don't care about how fast your scanners run and don't want to make any assumptions about the input to your scanner, always use -I.
Note that -I cannot be used in conjunction with full tables, i.e., the -f flag.
-L
instructs aflex to not generate #line directives (see below).
-T
makes aflex run in trace mode. It will generate a lot of messages to stdout concerning the form of the input and the resultant non-deterministic and deterministic finite automata. This option is mostly for use in maintaining aflex.
-Sskeleton_file
overrides the default internal skeleton from which aflex constructs its scanners. You'll probably never need this option unless you are doing aflex maintenance or development.
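
The rule numbers printed by a scanner built with -d follow the scheme given under that option. The Python sketch below only restates that scheme; it is not part of aflex, and the name describe_rule and its parameters are made up for illustration.

```python
def describe_rule(number: int, rule_count: int) -> str:
    """Explain a rule number printed by a scanner built with -d.

    rule_count is the number of rules in the aflex input (n in the text).
    This helper is hypothetical and exists only to illustrate the numbering.
    """
    if number < 0:
        raise ValueError("rule numbers are never negative")
    if number == 0:
        return "backtrack"
    if number <= rule_count:
        return f"user rule {number}"
    if number == rule_count + 1:
        return "default action"
    if number == rule_count + 2:
        return "refill input buffer and rescan"
    return "end-of-file action"


if __name__ == "__main__":
    # With three user rules, numbers 0 through 6 cover every category.
    for number in range(7):
        print(number, describe_rule(number, rule_count=3))
```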
 

INCOMPATIBILITIES WITH LEX

aflex is fully compatible with lex with the following exceptions:
-
Source file format:

The input specification file for aflex must use the following format.

                definitions section

                %%

                rules section

                %%

                user defined section

                ##

                user defined section

-
lex's %r (Ratfor scanners) and %t (translation table) options are not supported.
-
The do-nothing -n flag is not supported.
-
When definitions are expanded, aflex encloses them in parentheses. With lex, the following
    NAME    [A-Z][A-Z0-9]*
    %%
    foo{NAME}?      text_io.put_line( "Found it" );
    %%

will not match the string "foo" because when the macro is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence is such that the '?' is associated with "[A-Z0-9]*". With aflex, the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match. Note that because of this, the ^, $, <s>, and / operators cannot be used in a definition.
-
Input can be controlled by redefining the YY_INPUT function. YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its action is to place up to max_size characters in the character buffer "buf" and return in the integer variable "result" either the number of characters read or the constant YY_NULL to indicate EOF. The default YY_INPUT reads from Standard_Input.

You can also add things like keeping track of the input line number this way, but don't expect your scanner to go very fast.

-
Yytext is a function returning a vstring.
-
aflex reads only one input file, while lex's input is made up of the concatenation of its input files.
-
The following lex constructs are not supported:

- REJECT

- %T   -- character set tables

- %x   -- changes to internal array sizes (see below)
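
The difference in definition expansion described above (lex substitutes the text as is, aflex wraps it in parentheses) can be checked with ordinary regular expressions. The Python sketch below uses the re module only as a stand-in for lex-style patterns, and all names in it are illustrative. Because Python reads "*?" as a lazy quantifier, the lex precedence is spelled out with an explicit group.

```python
import re

# lex: textual substitution, so '?' binds only to "[A-Z0-9]*".
# The non-capturing group makes that lex precedence explicit in Python.
LEX_STYLE = re.compile(r"foo[A-Z](?:[A-Z0-9]*)?")

# aflex: the definition is wrapped in parentheses before '?' applies.
AFLEX_STYLE = re.compile(r"foo([A-Z][A-Z0-9]*)?")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Print, for each input, whether the lex-style and aflex-style
    # expansions accept it.
    for text in ("foo", "fooX", "fooX9"):
        print(text, matches(LEX_STYLE, text), matches(AFLEX_STYLE, text))
```

Only the bare string "foo" separates the two expansions; inputs that include a NAME are accepted by both.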

 

ENHANCEMENTS

-
Exclusive start-conditions can be declared by using %x instead of %s. These start-conditions have the property that when they are active, no other rules are active. Thus a set of rules governed by the same exclusive start condition describes a scanner which is independent of any of the other rules in the aflex input. This feature makes it easy to specify "mini-scanners" which scan portions of the input that are syntactically different from the rest (e.g., comments).
-
End-of-file rules. The special rule "<<EOF>>" indicates actions which are to be taken when an end-of-file is encountered and yywrap() returns non-zero (i.e., indicates no further files to process). The action can either call text_io.set_input() to switch to a new file to process, in which case the action should finish with YY_NEW_FILE (this is a branch, so subsequent code in the action won't be executed), or it should finish with a return statement. <<EOF>> rules may not be used with other patterns; they may only be qualified with a list of start conditions. If an unqualified <<EOF>> rule is given, it applies only to the INITIAL start condition, and not to %s start conditions. These rules are useful for catching things like unclosed comments. An example:
    %x quote
    %%
    ...
    <quote><<EOF>>   {
             error( "unterminated quote" );
             }
    <<EOF>>          {
             set_input( next_file );
             YY_NEW_FILE;
             }

-
aflex dynamically resizes its internal tables, so directives like "%a 3000" are not needed when specifying large scanners.
-
aflex generates --#line comments mapping lines in the output to their origin in the input file.
-
All actions must be enclosed by curly braces.
-
Comments may be put in the first section of the input by preceding them with '#'.
-
Ada style comments are supported instead of C style comments.
-
All template files are internalized.
-
The input source file must end with a ".l" extension.
 

FILES

The names of the files containing the generated scanner, IO, and DFA packages are based on the basename of the input file. For example, if the input file is called scan.l, then the scanner file is called scan.a, the DFA package is in scan_dfa.a, and scan_io.a is the IO package file. All of these file names may be changed by modifying the external_file_manager package (see the porting notes for more information).
aflex.backtrack
backtracking information for -b
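
The default naming scheme described above can be summarized in a few lines. This Python sketch is illustrative only: output_files is a hypothetical helper, not an aflex interface, and it does not reflect any changes made through the external_file_manager package.

```python
import os


def output_files(input_path: str) -> dict:
    """Return the default output file names for an aflex input file.

    Hypothetical helper: it only mirrors the naming described in FILES.
    """
    base, extension = os.path.splitext(os.path.basename(input_path))
    if extension != ".l":
        raise ValueError("aflex input files must end with a '.l' extension")
    return {
        "scanner": base + ".a",
        "dfa": base + "_dfa.a",
        "io": base + "_io.a",
    }


if __name__ == "__main__":
    # Show the three generated file names for the example input scan.l.
    for role, name in output_files("scan.l").items():
        print(role, name)
```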
 

SEE ALSO

lex(1)

M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator, Computing Science Technical Report 39, Bell Telephone Laboratories, Murray Hill, NJ, 1975.

Military Standard Ada Programming Language (ANSI/MIL-STD-1815A-1983), American National Standards Institute, January 1983.

T. Nguyen and K. Forester, Alex - An Ada Lexical Analysis Generator, Arcadia Document UCI-88-17, University of California, Irvine, 1988.

D. Taback and D. Tolani, Ayacc User's Manual, Arcadia Document UCI-85-10, University of California, Irvine, 1986  

AUTHOR

John Self. Based on the tool flex written and designed by Vern Paxson. It reimplements the functionality of the tool alex designed by Thieu Q. Nguyen.

Send requests for aflex information to alex-info@ics.uci.edu
Send bug reports for aflex to alex-bugs@ics.uci.edu
 

DIAGNOSTICS

aflex scanner jammed - a scanner compiled with -s has encountered an input string which wasn't matched by any of its rules.

old-style lex command ignored - the aflex input contains a lex command (e.g., "%n 1000") which is being ignored.  

BUGS

Some trailing context patterns cannot be properly matched and generate warning messages ("Dangerous trailing context"). These are patterns where the ending of the first part of the rule matches the beginning of the second part, such as "zx*/xy*", where the 'x*' matches the 'x' at the beginning of the trailing context. (Lex doesn't get these patterns right either.)
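
The ambiguity behind the "Dangerous trailing context" warning can be seen by listing every place where an input could be divided between the two parts of "zx*/xy*". The Python sketch below uses the re module as a stand-in for the scanner; possible_splits is a made-up name, and the code illustrates the problem only, not how aflex matches.

```python
import re

HEAD = re.compile(r"zx*")      # the part of the rule before '/'
TRAILING = re.compile(r"xy*")  # the trailing context after '/'


def possible_splits(text: str) -> list:
    """List every index where text could be divided into head and context."""
    splits = []
    for index in range(1, len(text) + 1):
        # The head must cover text[:index] exactly.
        head_ok = HEAD.fullmatch(text, 0, index) is not None
        # The trailing context only has to match starting at index.
        context_ok = TRAILING.match(text, index) is not None
        if head_ok and context_ok:
            splits.append(index)
    return splits


if __name__ == "__main__":
    # Print the candidate split points for two sample inputs.
    for text in ("zxy", "zxxy"):
        print(text, possible_splits(text))
```

An input with more than one possible split is one where the scanner has no single obvious end for the matched text.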

Variable trailing context (where both the leading and trailing parts do not have a fixed length) entails a substantial performance loss.

For some trailing context rules, parts which are actually fixed-length are not recognized as such, leading to the abovementioned performance loss. In particular, parts using '|' or {n} are always considered variable-length.

Nulls are not allowed in aflex inputs or in the inputs to scanners generated by aflex. Their presence generates fatal errors.

Pushing back definitions enclosed in ()'s can result in nasty, difficult-to-understand problems like:

        {DIG}  [0-9] -- a digit

In which the pushed-back text is "([0-9] -- a digit)".

Due to both buffering of input and read-ahead, you cannot intermix calls to text_io routines such as text_io.get() with aflex rules and expect it to work. Call input() instead.

There are still more features that could be implemented (especially REJECT). Also, the speed of the compressed scanners could be improved.

The utility needs more complete documentation.


 
